On this document, I’ve included the results from the initial exploration into the different model outputs, ranking of covariate influence, performance metrics, and prediction maps. This first set of models only includes extracted covariate data at a daily temporal resolution, but I am also considering exploring models that include covariate data at a seasonal or annual temporal resolution. The pseudo absences used in these models were generated using correlated random walk approaches, but another quarto document includes models with background sampling pseudo absences. Lastly, hyperparameters were tuned using the caret package and across all models, a learning rate of 0.05 and tree complexity of 3 resulted in the highest accuracy. Lastly, the ‘pred_var’ predictor is a random set of numbers that will be used to identify which predictor variables should be included in the final model, and which are not informative.
The hypotheses I would like to test with these models are as follows:
H1: The AGI model will perform better than the dissolved oxygen and null model, and the dissolved oxygen model will perform better than the null model.
study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?
H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.
study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?
H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.
study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?
H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.
study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?
H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.
study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?
Base models
These three models represent three different options for the base model and either include spatial predictors, a tag ID predictor, both, or neither. These models were developed by splitting the data set into 75/25 train/test, and thus that is the model evaluation approach used here. However, once a model is selected, I can run additional evaluation metrics (i.e., LOO, k-fold). I can also complete these now depending on when that is typically performed.
base_Nspat_Ntag <- readRDS (brt_outputs[7 ])
# model performance
ggBRT:: ggPerformance (base_Nspat_Ntag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.5736928
Correlation 0.8285057
AUC 0.9664000
Per.Expl 58.6164539
cvDeviance 0.8622091
cvCorrelation 0.6620174
cvAUC 0.8766500
cvPer.Expl 37.8042183
#relative influence of predictors
ggBRT:: ggInfluence (base_Nspat_Ntag)
rel.inf
bathy_mean 24.425351
temp_mean 18.087655
sal_mean 10.919911
chl_mean 9.336627
vostr_mean 6.686538
ssh_mean 6.168030
mld_mean 6.048063
bathy_sd 5.587410
uo_mean 3.417308
vo_mean 3.364026
uostr_mean 3.297190
pred_var 2.661890
#explore partial plots
gbm.plot (base_Nspat_Ntag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
base_int <- gbm.interactions (base_Nspat_Ntag)
base_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 2 temp_mean 952.17
2 3 sal_mean 2 temp_mean 710.45
3 8 ssh_mean 2 temp_mean 512.92
4 6 vo_mean 3 sal_mean 417.59
5 10 bathy_mean 3 sal_mean 368.97
6 2 temp_mean 1 chl_mean 365.72
7 10 bathy_mean 8 ssh_mean 278.76
#predictive performance using test dataset
preds <- predict.gbm (base_Nspat_Ntag, dat_test_base,
n.trees = base_Nspat_Ntag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_base$ PA, preds) #get % deviance
dat_pred_base <- cbind (dat_test_base$ PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1 ] == 1 , 2 ]
abs_base <- dat_pred_base[dat_pred_base[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_base, a = abs_base)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (base_Nspat_Ntag)
#eval 75/25
eval_7525_modified (base_Nspat_Ntag,
testInput = dat_test_base,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6700 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3096193 0.7932161 0.9479571 0.9934021 0.5428799 0.5861645
base_Nspat_Ytag <- readRDS (brt_outputs[8 ])
# model performance
ggBRT:: ggPerformance (base_Nspat_Ytag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.2305879
Correlation 0.9503452
AUC 0.9979000
Per.Expl 83.3664540
cvDeviance 0.4538959
cvCorrelation 0.8528183
cvAUC 0.9680300
cvPer.Expl 67.2580483
#relative influence of predictors
ggBRT:: ggInfluence (base_Nspat_Ytag)
rel.inf
tag 44.1894066
bathy_mean 17.5965698
temp_mean 11.0672136
sal_mean 6.8107330
chl_mean 4.6665128
ssh_mean 3.6780850
vostr_mean 3.5179667
mld_mean 2.5624157
bathy_sd 1.8533773
uostr_mean 1.2374256
vo_mean 1.1506546
uo_mean 1.0981552
pred_var 0.5714842
#explore partial plots
gbm.plot (base_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
base_int <- gbm.interactions (base_Nspat_Ytag)
base_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 1496.60
2 3 temp_mean 1 tag 1036.49
3 11 bathy_mean 1 tag 988.07
4 2 chl_mean 1 tag 512.43
5 9 ssh_mean 1 tag 369.40
6 11 bathy_mean 3 temp_mean 274.54
7 8 vostr_mean 1 tag 231.62
8 10 mld_mean 1 tag 207.08
#predictive performance using test dataset
preds <- predict.gbm (base_Nspat_Ytag, dat_test_base,
n.trees = base_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_base$ PA, preds) #get % deviance
dat_pred_base <- cbind (dat_test_base$ PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1 ] == 1 , 2 ]
abs_base <- dat_pred_base[dat_pred_base[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_base, a = abs_base)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (base_Nspat_Ytag)
#eval 75/25
eval_7525_modified (base_Nspat_Ytag,
testInput = dat_test_base,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1923806 0.9265175 0.9917029 0.9942702 0.7937122 0.8336645
base_Yspat_Ytag <- readRDS (brt_outputs[9 ])
# model performance
ggBRT:: ggPerformance (base_Yspat_Ytag)
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.1773819
Correlation 0.9648516
AUC 0.9992000
Per.Expl 87.2044910
cvDeviance 0.3814392
cvCorrelation 0.8803459
cvAUC 0.9770700
cvPer.Expl 72.4847405
#relative influence of predictors
ggBRT:: ggInfluence (base_Yspat_Ytag)
rel.inf
tag 40.2206057
dist_coast 24.0608862
lat 9.4741336
temp_mean 4.9067106
bathy_mean 4.6525961
sal_mean 4.4004394
chl_mean 3.5004953
vostr_mean 2.4769759
ssh_mean 1.7393792
mld_mean 1.7041726
bathy_sd 0.8436843
uo_mean 0.5521552
vo_mean 0.5356691
uostr_mean 0.5120463
pred_var 0.4200505
#explore partial plots
gbm.plot (base_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
base_int <- gbm.interactions (base_Yspat_Ytag)
base_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 883.10
2 5 sal_mean 1 tag 512.27
3 4 temp_mean 1 tag 484.21
4 3 chl_mean 1 tag 478.25
5 14 dist_coast 1 tag 452.76
6 12 bathy_mean 1 tag 425.15
7 11 mld_mean 1 tag 201.34
8 14 dist_coast 2 lat 160.15
9 14 dist_coast 4 temp_mean 153.48
10 10 ssh_mean 1 tag 149.67
11 15 pred_var 1 tag 139.75
#predictive performance using test dataset
preds <- predict.gbm (base_Yspat_Ytag, dat_test_base,
n.trees = base_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_base$ PA, preds) #get % deviance
dat_pred_base <- cbind (dat_test_base$ PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1 ] == 1 , 2 ]
abs_base <- dat_pred_base[dat_pred_base[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_base, a = abs_base)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (base_Yspat_Ytag)
#eval 75/25
eval_7525_modified (base_Yspat_Ytag,
testInput = dat_test_base,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1713338 0.9418611 0.9946135 0.9957419 0.8323754 0.8720449
DO models
I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include DO and the other environmental predictor variables as longer time scales (seasonal/annual).
0m, no spatial, yes tag 0m, yes spatial, yes tag 0m & 60m, no spatial, yes tag 0m & 250m, no spatial, yes tag 0m, 60m, & 250m, no spatial, yes tag 0m, 60m, & 250m, yes spatial, yes tag
do_0m_Nspat_Ytag <- readRDS (brt_outputs[14 ])
# model performance
ggBRT:: ggPerformance (do_0m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3408774
Correlation 0.9058550
AUC 0.9897000
Per.Expl 75.4108818
cvDeviance 0.5312851
cvCorrelation 0.8146489
cvAUC 0.9559100
cvPer.Expl 61.6758566
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_Nspat_Ytag)
rel.inf
tag 40.9080062
bathy_mean 23.7019091
o2_mean_0m 14.1885697
temp_mean 4.2249870
chl_mean 3.4047577
sal_mean 2.9679120
ssh_mean 2.7533054
vostr_mean 1.6135297
mld_mean 1.4718496
bathy_sd 1.0656363
pred_var 1.0330993
uostr_mean 0.9371363
vo_mean 0.9056197
uo_mean 0.8236821
#explore partial plots
gbm.plot (do_0m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 1100.34
2 2 o2_mean_0m 1 tag 577.24
3 5 sal_mean 1 tag 519.34
4 4 temp_mean 1 tag 249.16
5 10 ssh_mean 1 tag 212.28
6 12 bathy_mean 4 temp_mean 154.44
7 3 chl_mean 1 tag 153.94
8 13 bathy_sd 1 tag 126.85
9 11 mld_mean 1 tag 117.52
10 14 pred_var 1 tag 104.95
#predictive performance using test dataset
preds <- predict.gbm (do_0m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6150 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2392778 0.8810744 0.9811385 0.9981726 0.71633 0.7541088
do_0m_Yspat_Ytag <- readRDS (brt_outputs[15 ])
# model performance
ggBRT:: ggPerformance (do_0m_Yspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3169398
Correlation 0.9131868
AUC 0.9912000
Per.Expl 77.1376144
cvDeviance 0.4910827
cvCorrelation 0.8306125
cvAUC 0.9627200
cvPer.Expl 64.5758521
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_Yspat_Ytag)
rel.inf
tag 39.3243042
dist_coast 23.4675901
o2_mean_0m 9.2630160
bathy_mean 7.6602469
lat 7.4436658
temp_mean 2.1578941
chl_mean 2.1270429
sal_mean 2.0922701
ssh_mean 1.4504847
mld_mean 1.0484217
pred_var 0.9808561
vostr_mean 0.7765505
bathy_sd 0.6166353
uostr_mean 0.5730787
vo_mean 0.5270670
uo_mean 0.4908760
#explore partial plots
gbm.plot (do_0m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_Yspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 537.19
2 13 bathy_mean 1 tag 441.38
3 3 o2_mean_0m 1 tag 378.08
4 15 dist_coast 1 tag 190.11
5 6 sal_mean 1 tag 188.58
6 4 chl_mean 1 tag 144.71
7 11 ssh_mean 1 tag 112.11
8 5 temp_mean 1 tag 96.88
9 16 pred_var 1 tag 90.22
10 14 bathy_sd 1 tag 83.50
11 3 o2_mean_0m 2 lat 75.98
12 12 mld_mean 1 tag 69.54
13 5 temp_mean 2 lat 62.79
#predictive performance using test dataset
preds <- predict.gbm (do_0m_Yspat_Ytag, dat_test_do,
n.trees = do_0m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_Yspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5600 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2303471 0.8902957 0.9839578 0.9975502 0.7357788 0.7713761
do_0m_60m_Nspat_Ytag <- readRDS (brt_outputs[13 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3275756
Correlation 0.9101734
AUC 0.9904000
Per.Expl 76.3704047
cvDeviance 0.5211133
cvCorrelation 0.8185662
cvAUC 0.9575900
cvPer.Expl 62.4096016
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_Nspat_Ytag)
rel.inf
tag 39.7630992
bathy_mean 22.3644729
o2_mean_0m 13.1609254
o2_mean_60m 5.7538384
chl_mean 3.4719050
temp_mean 3.0814247
sal_mean 2.8796698
ssh_mean 2.4945921
mld_mean 1.4530477
vostr_mean 1.2564378
pred_var 1.0243164
bathy_sd 0.9583521
vo_mean 0.8437129
uostr_mean 0.8075706
uo_mean 0.6866347
#explore partial plots
gbm.plot (do_0m_60m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 1007.87
2 2 o2_mean_0m 1 tag 483.04
3 5 sal_mean 1 tag 376.66
4 14 o2_mean_60m 1 tag 244.01
5 4 temp_mean 1 tag 223.33
6 10 ssh_mean 1 tag 169.05
7 3 chl_mean 1 tag 158.22
8 13 bathy_sd 1 tag 133.89
9 12 bathy_mean 4 temp_mean 98.89
10 11 mld_mean 1 tag 89.75
11 7 uostr_mean 1 tag 85.56
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_60m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_60m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2355947 0.8847726 0.9820592 0.9988765 0.7244676 0.763704
do_0m_250m_Nspat_Ytag <- readRDS (brt_outputs[10 ])
# model performance
ggBRT:: ggPerformance (do_0m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3181841
Correlation 0.9124632
AUC 0.9911000
Per.Expl 77.0478579
cvDeviance 0.5109006
cvCorrelation 0.8216070
cvAUC 0.9592600
cvPer.Expl 63.1462878
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_250m_Nspat_Ytag)
rel.inf
tag 39.6887396
o2_mean_0m 16.8057119
o2_mean_250m 16.3627195
bathy_mean 11.6711360
temp_mean 2.7622574
chl_mean 2.2458410
sal_mean 2.1797729
ssh_mean 1.9088688
pred_var 1.1660577
mld_mean 1.0505143
bathy_sd 0.9624893
vostr_mean 0.9540213
uostr_mean 0.7842279
uo_mean 0.7318377
vo_mean 0.7258045
#explore partial plots
gbm.plot (do_0m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_250m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 624.45
2 2 o2_mean_0m 1 tag 515.76
3 14 o2_mean_250m 1 tag 451.56
4 5 sal_mean 1 tag 437.13
5 14 o2_mean_250m 2 o2_mean_0m 206.71
6 3 chl_mean 1 tag 190.92
7 10 ssh_mean 1 tag 169.35
8 4 temp_mean 1 tag 130.26
9 13 bathy_sd 1 tag 108.91
10 5 sal_mean 2 o2_mean_0m 103.02
11 15 pred_var 1 tag 95.95
#predictive performance using test dataset
preds <- predict.gbm (do_0m_250m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_250m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6250 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2338012 0.8864079 0.9826879 0.9979236 0.7298866 0.7704786
do_0m_60m_250m_Nspat_Ytag <- readRDS (brt_outputs[11 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3191965
Correlation 0.9122045
AUC 0.9909000
Per.Expl 76.9748229
cvDeviance 0.5103819
cvCorrelation 0.8220382
cvAUC 0.9594100
cvPer.Expl 63.1837098
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_250m_Nspat_Ytag)
rel.inf
tag 38.7855718
o2_mean_0m 16.7326988
o2_mean_250m 15.8676740
bathy_mean 11.1462711
o2_mean_60m 3.6391761
temp_mean 2.3942027
sal_mean 2.1301516
chl_mean 2.0440835
ssh_mean 1.6371297
pred_var 1.0443052
mld_mean 0.9945159
vostr_mean 0.8817324
bathy_sd 0.8374595
uostr_mean 0.7158327
vo_mean 0.6096311
uo_mean 0.5395639
#explore partial plots
gbm.plot (do_0m_60m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_250m_Nspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 569.59
2 15 o2_mean_250m 1 tag 463.02
3 2 o2_mean_0m 1 tag 458.51
4 5 sal_mean 1 tag 328.66
5 4 temp_mean 1 tag 178.30
6 3 chl_mean 1 tag 175.45
7 14 o2_mean_60m 1 tag 168.87
8 10 ssh_mean 1 tag 147.64
9 15 o2_mean_250m 2 o2_mean_0m 141.39
10 13 bathy_sd 1 tag 101.16
11 16 pred_var 1 tag 73.16
12 11 mld_mean 1 tag 72.52
13 7 uostr_mean 1 tag 68.16
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_250m_Nspat_Ytag, dat_test_do,
n.trees = do_0m_60m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_60m_250m_Nspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2347064 0.88541 0.9824251 0.9994384 0.7287845 0.7697482
do_0m_60m_250m_Yspat_Ytag <- readRDS (brt_outputs[12 ])
# model performance
ggBRT:: ggPerformance (do_0m_60m_250m_Yspat_Ytag)
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3011707
Correlation 0.9188137
AUC 0.9925000
Per.Expl 78.2751108
cvDeviance 0.4820971
cvCorrelation 0.8345785
cvAUC 0.9641300
cvPer.Expl 65.2240256
#relative influence of predictors
ggBRT:: ggInfluence (do_0m_60m_250m_Yspat_Ytag)
rel.inf
tag 37.5967831
dist_coast 20.6808825
o2_mean_0m 10.5992826
o2_mean_250m 8.5196739
lat 4.6145739
bathy_mean 3.8293017
o2_mean_60m 2.9614725
temp_mean 1.9990685
chl_mean 1.8128857
sal_mean 1.6521470
ssh_mean 1.2297337
pred_var 1.0165713
mld_mean 0.8872307
bathy_sd 0.6138532
uostr_mean 0.5819161
vostr_mean 0.4989877
vo_mean 0.4618753
uo_mean 0.4437606
#explore partial plots
gbm.plot (do_0m_60m_250m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
do_int <- gbm.interactions (do_0m_60m_250m_Yspat_Ytag)
do_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 367.17
2 13 bathy_mean 1 tag 301.62
3 3 o2_mean_0m 1 tag 273.23
4 17 o2_mean_250m 1 tag 253.83
5 6 sal_mean 1 tag 176.41
6 4 chl_mean 1 tag 142.58
7 15 dist_coast 1 tag 137.30
8 11 ssh_mean 1 tag 106.92
9 16 o2_mean_60m 1 tag 100.63
10 5 temp_mean 1 tag 76.80
11 18 pred_var 1 tag 74.23
12 14 bathy_sd 1 tag 69.96
13 12 mld_mean 1 tag 67.22
14 9 vo_mean 1 tag 54.84
15 10 vostr_mean 1 tag 49.70
16 17 o2_mean_250m 3 o2_mean_0m 40.74
#predictive performance using test dataset
preds <- predict.gbm (do_0m_60m_250m_Yspat_Ytag, dat_test_do,
n.trees = do_0m_60m_250m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_do$ PA, preds) #get % deviance
dat_pred_do <- cbind (dat_test_do$ PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1 ] == 1 , 2 ]
abs_do <- dat_pred_do[dat_pred_do[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_do, a = abs_do)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (do_0m_60m_250m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (do_0m_60m_250m_Yspat_Ytag,
testInput = dat_test_do,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2273319 0.893137 0.9847825 0.9983867 0.7430137 0.7827511
AGI models
I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include AGI and the other environmental predictor variables as longer time scales (seasonal/annual).
0m, no spatial, yes tag 0m, yes spatial, yes tag 0m & 60m, no spatial, yes tag 0m & 250m, no spatial, yes tag 0m, 60m, & 250m, no spatial, yes tag 0m, 60m, & 250m, yes spatial, yes tag
agi_0m_Nspat_Ytag <- readRDS (brt_outputs[5 ])
# model performance
ggBRT:: ggPerformance (agi_0m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3294417
Correlation 0.9120135
AUC 0.9914000
Per.Expl 76.2356104
cvDeviance 0.5331603
cvCorrelation 0.8132824
cvAUC 0.9557600
cvPer.Expl 61.5402963
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_Nspat_Ytag)
rel.inf
tag 41.4342777
bathy_mean 22.5768731
temp_mean 8.8146014
AGI_0m 5.3409758
sal_mean 4.3539170
ssh_mean 4.0682555
chl_mean 3.4388349
vostr_mean 2.4504783
uostr_mean 1.5708916
mld_mean 1.3780963
bathy_sd 1.3161888
vo_mean 1.2188877
pred_var 1.0875914
uo_mean 0.9501306
#explore partial plots
gbm.plot (agi_0m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 1 tag 803.70
2 4 sal_mean 1 tag 650.52
3 13 AGI_0m 3 temp_mean 565.85
4 3 temp_mean 1 tag 323.97
5 9 ssh_mean 1 tag 312.45
6 11 bathy_mean 3 temp_mean 257.21
7 13 AGI_0m 1 tag 254.47
8 12 bathy_sd 1 tag 207.39
9 2 chl_mean 1 tag 203.19
10 10 mld_mean 1 tag 153.81
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7250 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2314333 0.890198 0.9838225 1.000695 0.7279205 0.7623561
agi_0m_Yspat_Ytag <- readRDS (brt_outputs[6 ])
# model performance
ggBRT:: ggPerformance (agi_0m_Yspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3079406
Correlation 0.9177214
AUC 0.9925000
Per.Expl 77.7865966
cvDeviance 0.4916562
cvCorrelation 0.8299703
cvAUC 0.9625900
cvPer.Expl 64.5342064
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_Yspat_Ytag)
rel.inf
tag 39.1704772
dist_coast 22.9067106
lat 9.2973311
bathy_mean 7.5281540
temp_mean 4.6964781
AGI_0m 4.1053374
sal_mean 2.6165565
chl_mean 2.4600045
ssh_mean 1.8019982
pred_var 1.1308360
mld_mean 0.9622477
vostr_mean 0.7571586
bathy_sd 0.7305617
uostr_mean 0.6250570
uo_mean 0.6156725
vo_mean 0.5954189
#explore partial plots
gbm.plot (agi_0m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_Yspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 646.34
2 12 bathy_mean 1 tag 375.27
3 5 sal_mean 1 tag 260.79
4 3 chl_mean 1 tag 187.34
5 4 temp_mean 1 tag 176.90
6 15 dist_coast 1 tag 162.12
7 14 AGI_0m 1 tag 154.37
8 13 bathy_sd 1 tag 134.66
9 16 pred_var 1 tag 101.32
10 12 bathy_mean 4 temp_mean 99.58
11 10 ssh_mean 1 tag 92.42
12 11 mld_mean 1 tag 67.39
13 9 vostr_mean 1 tag 61.50
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_Yspat_Ytag, dat_test_agi,
n.trees = agi_0m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_Yspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6350 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2229464 0.8984309 0.9864263 1.002449 0.746439 0.777866
agi_0m_60m_Nspat_Ytag <- readRDS (brt_outputs[4 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3298267
Correlation 0.9115959
AUC 0.9912000
Per.Expl 76.2078336
cvDeviance 0.5248204
cvCorrelation 0.8175566
cvAUC 0.9574200
cvPer.Expl 62.1418985
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_Nspat_Ytag)
rel.inf
tag 41.1130461
bathy_mean 22.1175206
temp_mean 8.8167602
AGI_0m 5.1923827
sal_mean 3.8146119
AGI_60m 3.5914584
ssh_mean 3.3646809
chl_mean 2.8805052
vostr_mean 2.2362877
mld_mean 1.3300028
uostr_mean 1.2641285
vo_mean 1.2078786
bathy_sd 1.1634177
pred_var 1.0421567
uo_mean 0.8651621
#explore partial plots
gbm.plot (agi_0m_60m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 1 tag 680.43
2 13 AGI_0m 3 temp_mean 677.26
3 4 sal_mean 1 tag 537.76
4 3 temp_mean 1 tag 412.50
5 14 AGI_60m 1 tag 247.42
6 13 AGI_0m 1 tag 229.24
7 9 ssh_mean 1 tag 222.42
8 11 bathy_mean 3 temp_mean 187.87
9 12 bathy_sd 1 tag 151.81
10 10 mld_mean 1 tag 151.59
11 2 chl_mean 1 tag 146.54
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_60m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_60m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6700 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2313699 0.8901793 0.9837147 1.000391 0.7282437 0.7620783
agi_0m_250m_Nspat_Ytag <- readRDS (brt_outputs[1 ])
# model performance
ggBRT:: ggPerformance (agi_0m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3250560
Correlation 0.9119144
AUC 0.9912000
Per.Expl 76.5519740
cvDeviance 0.5193933
cvCorrelation 0.8180693
cvAUC 0.9578900
cvPer.Expl 62.5333843
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_250m_Nspat_Ytag)
rel.inf
tag 40.4877590
AGI_250m 16.0481464
bathy_mean 11.8550705
temp_mean 11.3274832
AGI_0m 4.0294752
sal_mean 3.1191434
ssh_mean 2.8171750
chl_mean 2.4561101
vostr_mean 1.7373119
bathy_sd 1.1323632
mld_mean 1.0927830
uostr_mean 1.0841796
pred_var 1.0292176
vo_mean 1.0064972
uo_mean 0.7772848
#explore partial plots
gbm.plot (agi_0m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_250m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 628.92
2 11 bathy_mean 1 tag 513.82
3 14 AGI_250m 1 tag 395.22
4 3 temp_mean 1 tag 374.74
5 2 chl_mean 1 tag 242.62
6 9 ssh_mean 1 tag 218.77
7 13 AGI_0m 3 temp_mean 203.96
8 13 AGI_0m 1 tag 202.13
9 14 AGI_250m 3 temp_mean 136.10
10 12 bathy_sd 1 tag 120.44
11 15 pred_var 1 tag 112.61
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_250m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_250m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6450 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2300722 0.891287 0.9844429 1.001254 0.7325363 0.7655197
agi_0m_60m_250m_Nspat_Ytag <- readRDS (brt_outputs[2 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_250m_Nspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3194562
Correlation 0.9142194
AUC 0.9917000
Per.Expl 76.9559147
cvDeviance 0.5170729
cvCorrelation 0.8191713
cvAUC 0.9583200
cvPer.Expl 62.7007660
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_250m_Nspat_Ytag)
rel.inf
tag 39.9057995
AGI_250m 15.9521143
bathy_mean 11.5951719
temp_mean 11.0079998
AGI_0m 3.9165535
sal_mean 2.9778595
AGI_60m 2.5421021
ssh_mean 2.3136229
chl_mean 2.1326512
vostr_mean 1.7126189
uostr_mean 1.2278622
bathy_sd 1.0302718
mld_mean 1.0176906
pred_var 1.0118376
vo_mean 0.9572175
uo_mean 0.6986267
#explore partial plots
gbm.plot (agi_0m_60m_250m_Nspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_250m_Nspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 598.01
2 11 bathy_mean 1 tag 474.54
3 15 AGI_250m 1 tag 312.33
4 3 temp_mean 1 tag 311.72
5 2 chl_mean 1 tag 242.05
6 14 AGI_60m 1 tag 224.33
7 9 ssh_mean 1 tag 200.59
8 13 AGI_0m 1 tag 191.78
9 13 AGI_0m 3 temp_mean 179.03
10 12 bathy_sd 1 tag 122.19
11 15 AGI_250m 3 temp_mean 119.14
12 8 vostr_mean 1 tag 81.82
13 16 pred_var 1 tag 80.72
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_250m_Nspat_Ytag, dat_test_agi,
n.trees = agi_0m_60m_250m_Nspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_250m_Nspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_60m_250m_Nspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6400 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2283 0.8930857 0.9848702 1.000961 0.7359933 0.7695591
agi_0m_60m_250m_Yspat_Ytag <- readRDS (brt_outputs[3 ])
# model performance
ggBRT:: ggPerformance (agi_0m_60m_250m_Yspat_Ytag)
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.2996914
Correlation 0.9203897
AUC 0.9931000
Per.Expl 78.3816567
cvDeviance 0.4880000
cvCorrelation 0.8312630
cvAUC 0.9631300
cvPer.Expl 64.7979495
#relative influence of predictors
ggBRT:: ggInfluence (agi_0m_60m_250m_Yspat_Ytag)
rel.inf
tag 38.8772398
dist_coast 19.6961943
lat 9.2401730
AGI_250m 7.9838041
bathy_mean 4.2759852
temp_mean 4.1810197
AGI_0m 3.2807915
AGI_60m 2.2096760
sal_mean 2.1899170
chl_mean 2.0352351
ssh_mean 1.2407309
pred_var 1.0135341
mld_mean 0.9384425
bathy_sd 0.6260702
vo_mean 0.5836817
uostr_mean 0.5717539
vostr_mean 0.5376097
uo_mean 0.5181415
#explore partial plots
gbm.plot (agi_0m_60m_250m_Yspat_Ytag, nplots = 10 , plot.layout = c (3 ,5 ), write.title = FALSE )
#find the 5 most important pairwise interactions
agi_int <- gbm.interactions (agi_0m_60m_250m_Yspat_Ytag)
agi_int$ rank.list
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 554.02
2 17 AGI_250m 1 tag 272.42
3 12 bathy_mean 1 tag 268.95
4 5 sal_mean 1 tag 205.22
5 3 chl_mean 1 tag 175.43
6 16 AGI_60m 1 tag 134.83
7 4 temp_mean 1 tag 134.10
8 15 dist_coast 1 tag 126.25
9 14 AGI_0m 1 tag 118.51
10 18 pred_var 1 tag 91.28
11 13 bathy_sd 1 tag 73.70
12 10 ssh_mean 1 tag 62.40
13 11 mld_mean 1 tag 54.65
14 8 vo_mean 1 tag 53.91
15 17 AGI_250m 2 lat 44.17
16 9 vostr_mean 1 tag 40.01
#predictive performance using test dataset
preds <- predict.gbm (agi_0m_60m_250m_Yspat_Ytag, dat_test_agi,
n.trees = agi_0m_60m_250m_Yspat_Ytag$ gbm.call$ best.trees,
type = "response" )
calc.deviance (obs = dat_test_agi$ PA, preds) #get % deviance
dat_pred_agi <- cbind (dat_test_agi$ PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 1 , 2 ]
abs_agi <- dat_pred_agi[dat_pred_agi[,1 ] == 0 , 2 ]
#evaluate (AUC, TSS, TPR)
e = evaluate (p = pres_agi, a = abs_agi)
plot (e, 'TPR' )
max (e@ TPR + e@ TNR - 1 ) #TSS
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt (agi_0m_60m_250m_Yspat_Ytag)
#eval 75/25
eval_7525_modified (agi_0m_60m_250m_Yspat_Ytag,
testInput = dat_test_agi,
gbm.y = "PA" )
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6250 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2208312 0.9003157 0.9869252 1.00278 0.7511082 0.7838166
Summary table of results
output_sum <- read.csv (here ("data/brt/mod_outputs/brt_crw_output_summary.csv" ))
kableExtra:: kable (output_sum)
base_0m_Nspat_Ntag
58.62
0.634
0.723
0.7550
0.948
0.3096
0.793
0.948
0.9923
0.586
base_0m_Nspat_Ytag
83.37
0.286
0.745
0.9199
0.992
0.1920
0.926
0.992
0.9940
0.834
base_0m_Yspat_Ytag
87.20
0.232
0.747
0.9360
0.995
0.1710
0.942
0.995
0.9960
0.872
do_0m_Nspat_Ytag
75.41
0.393
0.740
0.8480
0.981
0.2400
0.881
0.981
0.9980
0.754
do_0m_Yspat_Ytag
77.14
0.366
0.741
0.8580
0.984
0.2300
0.890
0.984
0.9970
0.771
do_0m_60m_Nspat_Ytag
76.37
0.382
0.749
0.8550
0.982
0.2360
0.885
0.982
0.9990
0.764
do_0m_250m_Nspat_Ytag
77.05
0.374
0.741
0.8570
0.983
0.2340
0.886
0.983
0.9980
0.770
do_0m_60m_250m_Nspat_Ytag
76.97
0.376
0.741
0.8530
0.882
0.2350
0.885
0.982
0.9990
0.770
do_0m_60m_250m_Yspat_Ytag
78.28
0.356
0.742
0.8650
0.985
0.2270
0.893
0.985
0.9980
0.783
agi_0m_Nspat_Ytag
76.24
0.377
0.741
0.8660
0.984
0.2310
0.890
0.984
1.0000
0.762
agi_0m_Yspat_Ytag
77.78
0.351
0.743
0.8780
0.986
0.2230
0.898
0.986
1.0020
0.778
agi_0m_60m_Nspat_Ytag
76.21
0.377
0.741
0.8620
0.984
0.2310
0.890
0.984
1.0000
0.762
agi_0m_250m_Nspat_Ytag
76.55
0.371
0.742
0.8650
0.984
0.2300
0.891
0.984
1.0000
0.765
agi_0m_60m_250m_Nspat_Ytag
76.96
0.366
0.742
0.8650
0.985
0.2280
0.893
0.985
1.0000
0.770
agi_0m_60m_250m_Yspat_Ytag
78.38
0.345
0.743
0.8790
0.987
0.2210
0.900
0.987
1.0000
0.784